Rajesh Jayaram
Email: rkjayaram (at) google (dot) com
I am a Research Scientist at Google NYC in the Algorithms and Optimization Group. I received my PhD in computer science from Carnegie Mellon University in the summer of 2021, where I was fortunate to be advised by David Woodruff. Prior to that, I received my bachelor's degree from Brown University in May 2017.
Research: I am interested primarily in sublinear algorithms and high-dimensional geometry, specifically sketching, streaming, and distributed algorithms for large-scale computational problems. More broadly, I am interested in dimensionality reduction methods: namely, to what extent can we compress the significant components of an enormous, noisy dataset? My work also spans the areas of property testing, machine learning, and optimization.
Google Scholar, DBLP.
Teaching:
I taught as an Adjunct Professor at NYU's Tandon School of Engineering.
- Spring 2022: NYU CS-GY 6763 - Algorithmic Machine Learning and Data Science
Workshops:
Dissertation:
Preprints:
- CRISP: Clustering Multi-Vector Representations for Denoising and Pruning
With João Veneroso, Jinmeng Rao, Gustavo Hernández Ábrego, Majid Hadian, and Daniel Cer. [arXiv]
Publications:
- Approximating High-Dimensional Earth Mover's Distance as Fast as Closest Pair
With Lorenzo Beretta, Vincent Cohen-Addad, and Erik Waingarten.
FOCS 2025
- Metric Embeddings Beyond Bi-Lipschitz Distortion via Sherali-Adams
With Ainesh Bakshi, Vincent Cohen-Addad, Samuel B. Hopkins, and Silvio Lattanzi.
COLT 2025 [arXiv]
- Randomized Dimensionality Reduction for Euclidean Maximization and Diversity Measures
With Jie Gao, Benedikt Kolbe, Shay Sapir, Chris Schwiegelshohn, Sandeep Silwal, and Erik Waingarten.
ICML 2025 [arXiv]
- Unleashing Graph Partitioning for Large-Scale Nearest Neighbor Search
With Laxman Dhulipala, Lars Gottesbüren, and Jakub Łącki.
VLDB 2025 [arXiv]
- Near-Optimal Spectral Density Estimation via Explicit and Implicit Deflation
With Rajarshi Bhattacharjee, Cameron Musco, Christopher Musco, and Archan Ray.
SODA 2025 [arXiv]
- Massively Parallel Minimum Spanning Tree in General Metric Spaces
With Amir Azarmehr, Soheil Behnezhad, Jakub Łącki, Vahab Mirrokni, and Peilin Zhong.
SODA 2025 [arXiv]
- MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings
With Laxman Dhulipala, Majid Hadian, Jason Lee, and Vahab Mirrokni.
NeurIPS 2024 [arXiv], [Google Research Blog], [Weaviate Podcast]
- Efficient Centroid-Linkage Clustering
With MohammadHossein Bateni, Laxman Dhulipala, Willem Fletcher, Kishen N Gowda, D Ellis Hershkowitz, and Jakub Łącki.
NeurIPS 2024 [arXiv]
- Metric Clustering and MST with Strong and Weak Distance Oracles
With MohammadHossein Bateni, Prathamesh Dharangutte, and Chen Wang.
COLT 2024 [arXiv]
- Parallel and Sequential Hardness of Hierarchical Graph Clustering
With MohammadHossein Bateni, Laxman Dhulipala, Kishen Gowda, D Ellis Hershkowitz, and Jakub Łącki.
ICALP 2024 [arXiv]
- Dynamic PageRank: Algorithms and Lower Bounds
With Jakub Łącki, Slobodan Mitrović, Krzysztof Onak, and Piotr Sankowski.
ICALP 2024 [arXiv]
- Data-Dependent LSH for the Earth Mover's Distance
With Erik Waingarten and Tian Zhang.
STOC 2024 [arXiv] [Talk]
- HyperAttention: Long-Context Attention in Near-Linear Time
With Insu Han, Amin Karbasi, Vahab Mirrokni, David Woodruff, and Amir Zandieh.
ICLR 2024 [arXiv]
- Massively Parallel Algorithms for High-Dimensional Euclidean Minimum Spanning Tree
With Vahab Mirrokni, Shyam Narayanan, and Peilin Zhong.
SODA 2024 [arXiv]
- Fully Dynamic Consistent k-Center Clustering
With Christoph Grunau, Bernhard Haeupler, Jakub Łącki, and Václav Rozhoň.
SODA 2024 [arXiv]
- Streaming Algorithms with Few State Changes
With David Woodruff and Samson Zhou.
PODS 2024 [arXiv]
- A Near-Linear Time Algorithm for the Chamfer Distance
With Ainesh Bakshi, Piotr Indyk, Sandeep Silwal, and Erik Waingarten.
NeurIPS 2023 [arXiv]
- Streaming Euclidean MST to a Constant Factor
With Vincent Cohen-Addad, Xi Chen, Amit Levi, and Erik Waingarten.
STOC 2023 [arXiv]
- Optimal Fully Dynamic k-Centers Clustering
With MohammadHossein Bateni, Hossein Esfandiari, and Vahab Mirrokni.
SODA 2023 [arXiv]
[Merged Paper] with Hendrik Fichtenberger, Monika Henzinger, and Andreas Wiese
- Differentially Oblivious Relational Database Operators
With Lianke Qin, Elaine Shi, Zhao Song, Danyang Zhuo, and Shumo Chu.
VLDB 2023 [arXiv]
- Stars: Tera-Scale Graph Building for Clustering and Learning
With CJ Carey, Jonathan Halcrow, Vahab Mirrokni, Warren Schudy, and Peilin Zhong.
NeurIPS 2022 [arXiv]
- New Streaming Algorithms for High Dimensional EMD and MST
With Xi Chen, Amit Levi, and Erik Waingarten.
STOC 2022 [arXiv], [Talk]
- Truly Perfect Samplers for Data Streams and Sliding Windows
With David Woodruff and Samson Zhou.
PODS 2022 [arXiv]
- An Optimal Algorithm for Triangle Counting in a Stream
With John Kallaugher.
APPROX 2021 [arXiv]
- Learning and Testing Junta Distributions with Subcube Conditioning
With Xi Chen, Amit Levi, and Erik Waingarten.
COLT 2021 [arXiv]
- In-Database Regression in Input Sparsity Time
With Alireza Samadian, David Woodruff, and Peng Ye.
ICML 2021 [arXiv]
- When is Approximate Counting for Conjunctive Queries Tractable?
With Marcelo Arenas, Luis Alberto Croquevielle, and Cristian Riveros.
STOC 2021 [arXiv]
- Testing Positive Semi-Definiteness via Random Submatrices
With Ainesh Bakshi and Nadiia Chepurko.
FOCS 2020 [arXiv], [Extended Abstract], [Talk @ WOLA'20], [Longer Talk]
- A Framework for Adversarially Robust Streaming Algorithms
With Omri Ben-Eliezer, David Woodruff, and Eylon Yogev.
PODS 2020 and Journal of the ACM [arXiv]
PODS Best Paper Award, 2020
Invited to the Journal of the ACM
2021 ACM SIGMOD Research Highlight Award
Invited to HALG 2021
- Span Recovery for Deep Neural Networks with Applications to Input Obfuscation
With Qiuyi Zhang and David Woodruff.
ICLR 2020 [arXiv], [Short Talk]
- Optimal Sketching for Kronecker Product Regression and Low Rank Approximation
With Huaian Diao, Zhao Song, Wen Sun, and David Woodruff.
NeurIPS 2019 [arXiv] [Poster]
- Towards Optimal Moment Estimation in Streaming and Distributed Models
With David Woodruff.
APPROX 2019 [arXiv]
- Learning Two Layer Rectified Neural Networks in Polynomial Time
With Ainesh Bakshi and David Woodruff.
COLT 2019 [arXiv], [Talk @ COLT]
- Efficient Logspace Classes for Enumeration, Counting, and Uniform Generation
With Marcelo Arenas, Luis Alberto Croquevielle, and Cristian Riveros.
PODS 2019 and Journal of the ACM [arXiv], [Talk], [SIGMOD Technical Perspective]
PODS Best Paper Award, 2019
Invited to the Journal of the ACM
2021 ACM SIGMOD Research Highlight Award
- Weighted Reservoir Sampling from Distributed Streams
With Gokarna Sharma, Srikanta Tirthapura, and David P. Woodruff.
PODS 2019 [arXiv]
- Perfect L_p Sampling in a Data Stream
With David Woodruff.
FOCS 2018 and SIAM Journal on Computing [arXiv]
- Data Streams with Bounded Deletions
With David Woodruff.
PODS 2018 [arXiv]
- Approximating Language Edit Distance Beyond Fast Matrix Multiplication: Ultralinear Grammars Are Where Parsing Becomes Hard!
With Barna Saha.
ICALP 2017 [Full Version], [Conference Version]